
Validation and functional annotation of expression-based clusters based on gene ontology

Author: Humburg Peter
Selbig Joachim
Steuer Ralf
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The biological interpretation of large-scale gene expression data is one of the paramount challenges in current bioinformatics. In particular, placing the results in the context of other available functional genomics data, such as existing bio-ontologies, has already provided substantial improvement for detecting and categorizing genes of interest. One common approach is to look for functional annotations that are significantly enriched within a group or cluster of genes, as compared to a reference group. RESULTS: In this work, we suggest the information-theoretic concept of mutual information to investigate the relationship between groups of genes, as given by data-driven clustering, and their respective functional categories. Drawing upon related approaches (Gibbons and Roth, Genome Research 12:1574-1581, 2002), we seek to quantify to what extent individual attributes are sufficient to characterize a given group or cluster of genes. CONCLUSION: We show that the mutual information provides a systematic framework to assess the relationship between groups or clusters of genes and their functional annotations in a quantitative way. Within this framework, the mutual information allows us to address and incorporate several important issues, such as the interdependence of functional annotations and combinations of attributes. It thus supplements and extends the conventional search for overrepresented attributes within a group or cluster of genes. In particular, by taking combinations of attributes into account, the mutual information opens the way to uncover specific functional descriptions of a group of genes or a clustering result. All datasets and functional annotations used in this study are publicly available. All scripts used in the analysis are provided as additional files.
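As a rough illustration of the quantity this abstract refers to, the following minimal sketch computes the mutual information between a cluster assignment and a single binary annotation attribute. It is not the authors' script: the function name `mutual_information` and the toy data are assumptions made here for illustration, and the plain plug-in (frequency-count) estimate is used.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equally long label sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("sequences must not be empty")
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: one cluster id per gene, and whether the gene
# carries some annotation term (both invented for this example).
clusters = [0, 0, 0, 1, 1, 1]
has_term = [True, True, True, False, False, False]
print(mutual_information(clusters, has_term))
```

In this toy case the attribute fully determines the cluster, so the estimate equals the entropy of the cluster assignment; an attribute distributed independently of the clusters would give zero.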

Network Structure and Biological Function: Reconstruction, Modeling, and Statistical Approaches

Author: Repsilber Dirk
Selbig Joachim
Steinfath Matthias
Publication venue: BioMed Central
Publication date: 01/01/2009

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)

Mass-balanced randomization of metabolic networks

Author: Basler Georg
Ebenhöh Oliver
Nikoloski Zoran
Selbig Joachim
Publication venue: Oxford University Press
Publication date: 01/01/2011

Motivation: Network-centered studies in systems biology attempt to integrate the topological properties of biological networks with experimental data in order to make predictions and posit hypotheses. For any topology-based prediction, it is necessary to first assess the significance of the analyzed property in a biologically meaningful context. Therefore, devising network null models, carefully tailored to the topological and biochemical constraints imposed on the network, remains an important computational problem.

CiteSeerX

Species-specific analysis of protein sequence motifs using mutual information

Author: Hummel Jan
Keshvari Nima
Selbig Joachim
Weckwerth Wolfram
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Protein sequence motifs are by definition short fragments of conserved amino acids, often associated with a specific function. Accordingly, protein sequence profiles derived from multiple sequence alignments provide an alternative description of functional motifs characterizing families of related sequences. Such profiles conveniently reflect functional necessities by pointing out proximity at conserved sequence positions as well as depicting distances at variable positions. Discovering significant conservation characteristics within the variable positions of profiles mirrors group-specific and, in particular, evolutionary features of the underlying sequences. RESULTS: We describe the tool PROfile analysis based on Mutual Information (PROMI), which enables comparative analysis of user-classified protein sequences. PROMI is implemented as a web service using Perl and R as well as other publicly available packages and tools on the server side. On the client side, platform independence is achieved by generally applied internet delivery standards. As one possible application, analysis of the zinc finger C(2)H(2)-type protein domain is introduced to illustrate the functionality of the tool. CONCLUSION: The web service PROMI should assist researchers to detect evolutionary correlations in protein profiles of defined biological sequences. It is available at where additional documentation can be found.

Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data

Author: Daub Carsten O
Kloska Sebastian
Selbig Joachim
Steuer Ralf
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The information-theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression, it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. RESULTS: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: the significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. C++ source code for our algorithm is available for non-commercial use from [email protected] upon request. CONCLUSION: The utilisation of mutual information as a similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.
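To make the binning step mentioned in this abstract concrete, here is a minimal sketch of the simple equal-width binning estimate of mutual information for continuous data. This is the naive baseline the abstract says can be inaccurate for small datasets, not the B-spline method itself. The function names, the choice of four bins, and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to one of n_bins equal-width bins."""
    if hi == lo:
        return 0  # constant data: everything falls into a single bin
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last bin


def binned_mutual_information(xs, ys, n_bins=4):
    """Naive mutual information estimate (bits) after equal-width binning."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    bx = [bin_index(x, min(xs), max(xs), n_bins) for x in xs]
    by = [bin_index(y, min(ys), max(ys), n_bins) for y in ys]
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))
    return sum(
        (c / n) * log2(c * n / (count_x[i] * count_y[j]))
        for (i, j), c in count_xy.items()
    )


# Hypothetical toy data: a symmetric parabola, which has zero linear
# correlation with x but is clearly dependent on it.
xs = [float(i) for i in range(40)]
ys = [(x - 19.5) ** 2 for x in xs]
print(binned_mutual_information(xs, ys))
```

The estimate depends on the number of bins and on where the bin boundaries fall, which is the sensitivity that motivates smoother estimators such as the one proposed in the paper.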

Evolutionary significance of metabolic network properties

Author: Basler Georg
Ebenhöh Oliver
Grimbs Sergio
Nikoloski Zoran
Selbig Joachim
Publication venue: The Royal Society
Publication date: 01/01/2011

Complex networks have been successfully employed to represent different levels of biological systems, ranging from gene regulation to protein–protein interactions and metabolism. Network-based research has mainly focused on identifying unifying structural properties, such as small average path length, large clustering coefficient, heavy-tail degree distribution and hierarchical organization, viewed as requirements for efficient and robust system architectures. However, for biological networks, it is unclear to what extent these properties reflect the evolutionary history of the represented systems. Here, we show that the salient structural properties of six metabolic networks from all kingdoms of life may be inherently related to the evolution and functional organization of metabolism by employing network randomization under mass balance constraints. Contrary to the results from the common Markov-chain switching algorithm, our findings suggest the evolutionary importance of the small-world hypothesis as a fundamental design principle of complex networks. The approach may help us to determine the biologically meaningful properties that result from evolutionary pressure imposed on metabolism, such as the global impact of local reaction knockouts. Moreover, the approach can be applied to test to what extent novel structural properties can be used to draw biologically meaningful hypotheses or predictions from structure alone.

The stability and robustness of metabolic states: identifying stabilizing sites in metabolic networks

Author: Bulik Sascha
Grimbs Sergio
Holzhütter Hermann‐Georg
Mulquiney PJ
Selbig Joachim
Steuer Ralf
Publication venue: Nature Publishing Group
Publication date: 01/01/2007

The dynamic behavior of metabolic networks is governed by numerous regulatory mechanisms, such as reversible phosphorylation, binding of allosteric effectors or temporal gene expression, by which the activity of the participating enzymes can be adjusted to the functional requirements of the cell. For most of the cellular enzymes, such regulatory mechanisms are at best qualitatively known, whereas detailed enzyme-kinetic models are lacking. To explore the possible dynamic behavior of metabolic networks in cases of lacking or incomplete enzyme-kinetic information, we present a computational approach based on structural kinetic modeling. We derive statistical measures for the relative impact of enzyme-kinetic parameters on dynamic properties (such as local stability) and apply our approach to the metabolism of human erythrocytes. Our findings show that allosteric enzyme regulation significantly enhances the stability of the network and extends its potential dynamic behavior. Moreover, our approach allows us to differentiate quantitatively between metabolic states related to senescence and metabolic collapse of the human erythrocyte. We think that the proposed method represents an important intermediate step on the long way from topological network analysis to detailed kinetic modeling of complex metabolic networks.

Crossref

SLocX: Predicting Subcellular Localization of Arabidopsis Proteins Leveraging Gene Expression Data

Author: Childs Liam
Giorgi Federico M.
Lohse Marc
Lude Anja
Ryngajllo Malgorzata
Selbig Joachim
Usadel Björn
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2011

Despite the growing volume of experimentally validated knowledge about the subcellular localization of plant proteins, a well-performing in silico prediction tool is still a necessity. Existing tools, which employ information derived from protein sequence alone, offer limited accuracy and/or rely on full sequence availability. We explored whether gene expression profiling data can be harnessed to enhance prediction performance. To achieve this, we trained several support vector machines to predict the subcellular localization of Arabidopsis thaliana proteins using sequence-derived information, expression behavior, or a combination of these data, and compared their predictive performance through a cross-validation test. We show that gene expression carries information about the subcellular localization not available in sequence information, yielding dramatic benefits for plastid localization prediction, and some notable improvements for other compartments such as the mitochondrion, the Golgi, and the plasma membrane. Based on these results, we constructed a novel subcellular localization prediction engine, SLocX, combining gene expression profiling data with protein sequence-based information. We then validated the results of this engine using an independent test set of annotated proteins and a transient expression of GFP fusion proteins. Here, we present the prediction framework and a website of predicted localizations for Arabidopsis. The relatively good accuracy of our prediction engine, even in cases where only partial protein sequence is available (e.g., in sequences lacking the N-terminal region), offers a promising opportunity for similar application to non-sequenced or poorly annotated plant species. Although the prediction scope of our method is currently limited by the availability of expression information on the ATH1 array, we believe that advances in gene expression measurement technology will make our method applicable to all Arabidopsis proteins.

Frontiers - Publisher Connector

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT

Author: Achenbach Ute
Basekow Rico
Diehl Svenja
Gebhardt Christiane
Gyetvai Gabor
Kersten Birgit
Neigenfind Jost
Selbig Joachim
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: Haplotype inference based on unphased SNP markers is an important task in population genetics. Although there are different approaches to the inference of haplotypes in diploid species, the existing software is not suitable for inferring haplotypes from unphased SNP data in polyploid species, such as the cultivated potato (Solanum tuberosum). Potato species are tetraploid and highly heterozygous. RESULTS: Here we present the software SATlotyper, which is able to handle polyploid and polyallelic data. SATlotyper uses the Boolean satisfiability problem to formulate Haplotype Inference by Pure Parsimony. The software excludes existing haplotype inferences, thus allowing for calculation of alternative inferences. As it is not known which of the multiple haplotype inferences are best supported by the given unphased data set, we use a bootstrapping procedure that allows for scoring of alternative inferences. Finally, by means of the bootstrapping scores, it is possible to optimise the phased genotypes belonging to a given haplotype inference. The program is evaluated with simulated and experimental SNP data generated for heterozygous tetraploid populations of potato. We show that, instead of taking the first haplotype inference reported by the program, we can significantly improve the quality of the final result by applying additional methods that include scoring of the alternative haplotype inferences and genotype optimisation. For a sub-population of nineteen individuals, the predicted results computed by SATlotyper were directly compared with results obtained by experimental haplotype inference via sequencing of cloned amplicons. Prediction and experiment gave similar results regarding the inferred haplotypes and phased genotypes. CONCLUSION: Our results suggest that Haplotype Inference by Pure Parsimony can be solved efficiently by the SAT approach, even for data sets of unphased SNPs from heterozygous polyploids. SATlotyper is freeware and is distributed as a Java JAR file. The software can be downloaded from the webpage of the GABI Primary Database at http://www.gabipd.org/projects/satlotyper/. The application of SATlotyper will provide haplotype information, which can be used in haplotype association mapping studies of polyploid plants.

Crossref